seating arrangement
Tractable Weighted First-Order Model Counting with Bounded Treewidth Binary Evidence
Kůla, Václav, Kuang, Qipeng, Wang, Yuyi, Wang, Yuanhong, Kuželka, Ondřej
The Weighted First-Order Model Counting Problem (WFOMC) asks to compute the weighted sum of models of a given first-order logic sentence over a given domain. Conditioning WFOMC on evidence -- fixing the truth values of a set of ground literals -- has been shown impossible in time polynomial in the domain size (unless $\mathsf{\#P \subseteq FP}$) even for fragments of logic that are otherwise tractable for WFOMC without evidence. In this work, we address the barrier by restricting the binary evidence to the case where the underlying Gaifman graph has bounded treewidth. We present a polynomial-time algorithm in the domain size for computing WFOMC for the two-variable fragments $\text{FO}^2$ and $\text{C}^2$ conditioned on such binary evidence. Furthermore, we show the applicability of our algorithm in combinatorial problems by solving the stable seating arrangement problem on bounded-treewidth graphs of bounded degree, which was an open problem. We also conducted experiments to show the scalability of our algorithm compared to the existing model counting solvers.
RiddleBench: A New Generative Reasoning Benchmark for LLMs
Halder, Deepon, Saji, Alan, Jayakumar, Thanmay, Puduppully, Ratish, Kunchukuttan, Anoop, Dabre, Raj
Large Language Models have demonstrated strong performance on many established reasoning benchmarks. However, these benchmarks primarily evaluate structured skills like quantitative problem-solving, leaving a gap in assessing flexible, multifaceted reasoning abilities that are central to human intelligence. These abilities require integrating logical deduction with spatial awareness and constraint satisfaction, which current evaluations do not measure well. To address this, we introduce RiddleBench, a benchmark of 1,737 challenging puzzles in English designed to probe these core reasoning capabilities. Evaluation of state-of-the-art models on RiddleBench shows fundamental weaknesses. Even top proprietary models like Gemini 2.5 Pro, o3, and Claude 4 Sonnet achieve accuracy just above 60% (60.30%, 63.37%, and 63.16%). Analysis further reveals deep failures, including hallucination cascades (accepting flawed reasoning from other models) and poor self-correction due to a strong self-confirmation bias. Their reasoning is also fragile, with performance degrading significantly when constraints are reordered or irrelevant information is introduced. RiddleBench functions as a diagnostic tool for these issues and as a resource for guiding the development of more robust and reliable language models.
Dispersion of personal spaces
Horáček, Jaroslav, Rada, Miroslav
There are many entities that disseminate in the physical space - information, gossip, mood, innovation etc. Personal spaces are also entities that disperse and interplay. In this work we study the emergence of configurations formed by participants when choosing a place to sit in a rectangular auditorium. Based on experimental questionnaire data we design several models and assess their relevancy to a real time-lapse footage of lecture hall being filled up. The main focus is to compare the evolution of entropy of occupied seat configurations in time. Even though the process of choosing a seat is complex and could depend on various properties of participants or environment, some of the developed models can capture at least basic essence of the real processes. After introducing the problem of seat selection and related results in close research areas, we introduce preliminary collected data and build models of seat selection based on them. We compare the resulting models to the real observational data and discuss areas of future research directions.
Filling a theatre in times of corona
Blom, Danny, Pendavingh, Rudi, Spieksma, Frits C. R.
Submitted to INFORMS Journal on Applied Analytics manuscript (Please, provide the manuscript number!) Authors are encouraged to submit new papers to INFORMS journals by means of a style file template, which includes the journal title. However, use of a template does not certify that the paper has been accepted for publication in the named journal. INFORMS journal templates are for the exclusive purpose of submitting to an INFORMS journal and should not be used to distribute the papers in print or online or to submit the papers to another publication. In this paper, we introduce an optimization problem posed by the Music Building Eindhoven (MBE) to deal with the economical consequences of the COVID-19 pandemic for theatre halls. We propose a model for maximizing the number of guests in a theatre hall that respects social distancing rules, and is based on trapezoid packings. Computational results show that up to 40% of the normal capacity can be used for a single show setting, and up to 70 % in case artists opt for two consecutive performances per evening. All around the world, the corona-crisis has hit the cultural sector hard. Festivals are cancelled, orchestra's are at the brink of bankruptcy, choirs have stopped performing, and theatres are struggling to survive.
Quantum Annealing for Dirichlet Process Mixture Models with Applications to Network Clustering
Sato, Issei, Tanaka, Shu, Kurihara, Kenichi, Miyashita, Seiji, Nakagawa, Hiroshi
We developed a new quantum annealing (QA) algorithm for Dirichlet process mixture (DPM) models based on the Chinese restaurant process (CRP). QA is a parallelized extension of simulated annealing (SA), i.e., it is a parallel stochastic optimization technique. Existing approaches [Kurihara et al. UAI2009, Sato et al. UAI2009] and cannot be applied to the CRP because their QA framework is formulated using a fixed number of mixture components. The proposed QA algorithm can handle an unfixed number of classes in mixture models. We applied QA to a DPM model for clustering vertices in a network where a CRP seating arrangement indicates a network partition. A multi core processor was used for running QA in experiments, the results of which show that QA is better than SA, Markov chain Monte Carlo inference, and beam search at finding a maximum a posteriori estimation of a seating arrangement in the CRP. Since our QA algorithm is as easy as to implement the SA algorithm, it is suitable for a wide range of applications.
Restricted Collapsed Draw: Accurate Sampling for Hierarchical Chinese Restaurant Process Hidden Markov Models
Makino, Takaki, Takei, Shunsuke, Sato, Issei, Mochihashi, Daichi
We propose a restricted collapsed draw (RCD) sampler, a general Markov chain Monte Carlo sampler of simultaneous draws from a hierarchical Chinese restaurant process (HCRP) with restriction. Models that require simultaneous draws from a hierarchical Dirichlet process with restriction, such as infinite Hidden markov models (iHMM), were difficult to enjoy benefits of \markerg{the} HCRP due to combinatorial explosion in calculating distributions of coupled draws. By constructing a proposal of seating arrangements (partitioning) and stochastically accepts the proposal by the Metropolis-Hastings algorithm, the RCD sampler makes accurate sampling for complex combination of draws while retaining efficiency of HCRP representation. Based on the RCD sampler, we developed a series of sophisticated sampling algorithms for iHMMs, including blocked Gibbs sampling, beam sampling, and split-merge sampling, that outperformed conventional iHMM samplers in experiments
Improvements to the Sequence Memoizer
The sequence memoizer is a model for sequence data with state-of-the-art performance on language modeling and compression. We propose a number of improvements to the model and inference algorithm, including an enlarged range of hyperparameters, a memory-efficient representation, and inference algorithms operating on the new representation. Our derivations are based on precise definitions of the various processes that will also allow us to provide an elementary proof of the mysterious" coagulation and fragmentation properties used in the original paper on the sequence memoizer by Wood et al. (2009). We present some experimental results supporting our improvements."